DOCUMENT RESUME 



ED 061 256                                              TM 001 158

AUTHOR        Jackson, Douglas N.; And Others
TITLE         An Evaluation of Forced-Choice and True-False Item
              Formats in Personality Assessment.
INSTITUTION   Educational Testing Service, Princeton, N.J.
REPORT NO     RB-71-67
PUB DATE      Dec 71
NOTE          26p.

EDRS PRICE    MF-$0.65 HC-$3.29
DESCRIPTORS   Behavior Rating Scales; College Housing; *College
              Students; Comparative Analysis; Correlation; *Forced
              Choice Technique; Multiple Choice Tests; Peer
              Relationship; *Personality Assessment; Personality
              Tests; Response Mode; *Response Style (Tests); Self
              Evaluation; *Test Bias; Test Reliability; Tests; Test
              Validity
IDENTIFIERS   *Personality Research Form; PRF

ABSTRACT
     In a comparative evaluation of a standard true-false
format for personality assessment and a forced-choice format,
subjects from college residential units were assigned randomly to
respond either to the forced-choice or standard true-false form of
the Personality Research Form (PRF). All subjects also rated
themselves and the members of their residential units on behavior
traits corresponding to the PRF scales. Reliabilities of the scales
comprising the true-false form were substantially higher than those
in the forced-choice form. Peer rating validities for the true-false
and forced-choice forms were in a comparable range, but correlations
with self-ratings were higher for the true-false form. Results do not
support the contention that for personality scales a forced-choice
format is consistently more valid than a standard format. Considering
the other advantages of the true-false format, including its freedom
from the complicating effects of ipsative scores, the use of this
format is recommended for the great majority of applications in
personality assessment. (Author)




AN EVALUATION OF FORCED-CHOICE AND TRUE-FALSE

ITEM FORMATS IN PERSONALITY ASSESSMENT

Douglas N. Jackson
University of Western Ontario

John A. Neill
University of Guelph

and

Ann R. Bevan
Brock University



DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING
IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY
REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.






This Bulletin is a draft for interoffice circulation.
Corrections and suggestions for revision are solicited.
The Bulletin should not be cited as a reference without
the specific permission of the authors. It is automatically
superseded upon formal publication of the material.






Educational Testing Service 
Princeton, New Jersey
December 1971 






An Evaluation of Forced-Choice and True-False
Item Formats in Personality Assessment

Douglas N. Jackson
University of Western Ontario

John A. Neill
University of Guelph

and

Ann R. Bevan
Brock University



Abstract 



In a comparative evaluation of a standard true-false format for
personality assessment and a forced-choice format, subjects from college
residential units were assigned randomly to respond either to the
forced-choice or standard true-false form of the Personality Research Form
(PRF). All subjects also rated themselves and the members of their
residential units on behavior traits corresponding to the PRF scales.
Reliabilities of the scales comprising the true-false form were
substantially higher than those in the forced-choice form. Peer rating
validities for the true-false and forced-choice forms were in a comparable
range, but correlations with self-ratings were higher for the true-false
form. Results do not support the contention that for personality scales a
forced-choice format is consistently more valid than a standard format.
Considering the other advantages of the true-false format, including its
freedom from the complicating effects of ipsative scores, the use of this
format is recommended for the great majority of applications in personality
assessment.











An Evaluation of Forced-Choice and True-False
Item Formats in Personality Assessment¹,²

Douglas N. Jackson                    John A. Neill
University of Western Ontario         University of Guelph

and

Ann R. Bevan
Brock University

The primary purpose of the forced-choice technique in personality
assessment, according to its adherents, is to reduce bias in response to
items (Edwards, 1953, 1954; Gordon, 1951). In essence, the forced-choice
method, as the term is employed in this paper, consists of pairing a
self-descriptive statement pertaining to a personality trait with a
trait-irrelevant filler statement having a very similar index of
favorableness. The subject is asked to choose the statement which is more
characteristic of himself.

Item parameters based on both desirability scale values and item
popularities have been used as the favorableness index for matching
purposes. Heineman (1953) and Edwards (1954, 1957), for example, have
preferred matching items on the basis of desirability scale values, while
Jackson and Payne (1963) preferred matching items on item popularities. The
rationale for the former is that if a subject is forced to choose between
items matched on desirability scale value, he cannot respond in terms of
the desirability of items, and therefore is more likely to respond to the
content of items. The result should be a reduction in the influence of
desirability response style, increased resistance to faking, and consequent
higher scale validity. Matching on item popularity, in addition to reducing
the influence of response styles, has the added advantage that the expected
popularity of each forced-choice item should be close to .50, no matter how
extreme the popularities of the original statements were. Consequently, the
matching procedures produce an increase in item and scale variance with a
subsequent increase in scale reliability (see Magnusson, 1967, pp. 53-77).
Because of the relatively high correlation between the item popularity and
desirability scale value (Edwards, 1953), however, the two methods of
pairing probably yield scales with similar properties, although the
research has indicated that forced-choice scales often do have higher
reliabilities than their nonforced counterparts; for example, Jackson and
Payne (1963) reported that reliability increased from .81 to .96 when a
forced-choice format was used instead of a standard single stimulus format.
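The link between item popularity and item variance described above can be illustrated with a minimal sketch. It assumes only the standard result that a dichotomous (0/1) item with endorsement proportion p has variance p(1 - p); the function name and the popularities in the loop are hypothetical and are not taken from the study.

```python
def item_variance(p):
    """Variance of a dichotomous (0/1) item with endorsement proportion p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return p * (1.0 - p)


# Hypothetical popularities, chosen only to show the shape of the function.
for p in (0.05, 0.20, 0.50, 0.80, 0.95):
    print(f"popularity {p:.2f} -> item variance {item_variance(p):.4f}")
```

The variance peaks at a popularity of .50 and shrinks symmetrically toward the extremes, which is why pairing statements so that each forced-choice item has an expected popularity near .50 is expected to raise item variance.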

On the subject of validity, the research indicates that neither nonforced
nor forced-choice items have a clear advantage (cf. Borislow, 1958; Izard &
Rosenberg, 1958; Krug, 1958; Longstaff & Jurgensen, 1953; Maher, 1959;
Mais, 1951; Norman, 1963; Rusmore, 1956; Scott, 1968; Waters & Wherry,
1962; Winters, Bartlett, & Leva, 1965). Furthermore, it has become very
clear that matching statements on desirability scale value does not prevent
people from reliably judging one member of the forced-choice pair to be
more desirable than the other (Corah, Feldman, Cohen, Gruen, Meadow, &
Ringwall, 1958; Edwards, Wright, & Lunneborg, 1959; Feldman & Corah, 1960;
Saltz, Reece, & Ager, 1962). Apparently placing statements in the
forced-choice context accentuates subtle differences in the desirability of
items (see Corah et al., 1958; Feldman & Corah, 1960; La Pointe & Auclair,
1961).

Although many studies have been undertaken to compare true-false and
forced-choice item formats, they have been fraught with difficulties
(Scott, 1968). One problem in assessing the existing research comparing
true-false and forced-choice formats is that many of the instruments have
been composed of unselected samples of items, a procedure which is hardly
justified in view of modern developments in test theory and computer
analysis (Neill & Jackson, 1970). Rather, recent recommendations would
emphasize the usefulness of a variety of strategies for selecting items to
maximize content saturation and minimize sources of bias. Thus, an
appropriate investigation within this perspective would be to evaluate the
advantages of a forced-choice format after self-descriptive statements had
been carefully selected for content saturation and freedom from bias. If
the role of desirability bias has already been greatly reduced in item
selection, the question remains as to the extent to which the forced-choice
format might enhance validity. Another problem that makes existing research
difficult to assess is that comparisons have often been made between
forced-choice and true-false scales that did not contain the same items. In
such cases, differences between scales could at least partially be
attributed to differences in samples of items. A further problem in
assessing existing research is that the studies have often been conducted
on a very narrow range of content, often on only one or two dimensions of
personality.

The present study seeks to remedy these problems by comparing a set of
forced-choice scales with a parallel set of true-false scales covering a
large range of personality dimensions. For each dimension the statements
are identical in the true-false version and in the forced-choice version.
Furthermore, unlike many previous studies, which have limited their
comparisons to reliability, the present study extends the comparisons
between scales into the area of their validity with respect to behavior
rating criteria. In the course of this investigation we will have occasion
to examine properties of the behavior rating criteria, particularly the
effects of degree of acquaintance of another person upon the validity and
differentiation of ratings of that person.



Method 



Subjects 

Subjects, a total of 216 female university student volunteers, were
drawn from 13 residential groups, each consisting of one wing of a large
women's residence. Twenty-six women lived in each wing; the number of
volunteer participants from each wing ranged from 13 to 23. Subjects in 12
of the 13 units had been living together for at least seven months, and in
the remaining unit for three and one-half months.



Experimental Measures 

Personality Research Form. Form AA (Jackson, 1967) of this personality
questionnaire consists of 440 self-report statements yielding scores for
20 personality traits in the tradition of Murray (1938), as well as for
two validity scales, Infrequency and Desirability. The standard instructions
used in the present study indicated that the subject was to decide whether or
not each item was characteristic of her, and then to answer true or false on
a separate answer sheet.

In addition, a special experimental forced-choice form of the PRF (Form C)
was constructed from the statements comprising Form AA to measure the same 20
traits. A statement from each of the 20 scales was paired with a second
statement from one of the other scales, with the restriction that no more than
two pairs of statements were comprised of statements from the same two scales.
For almost every trait 19 of its statements were paired with statements
representing the other 19 traits, one for each scale. Statements were paired
on the basis of similar endorsement proportions, the difference between
proportions of paired statements being in almost every case no greater than
.02. Statements not paired with other keyed statements were paired with one
of 50 irrelevant filler statements. Because of the special nature of the
characteristics measured by the Infrequency and Desirability scales,
positively keyed items for each of these scales were paired with negatively
keyed items from
the same scale. This procedure yielded a total of 247 item pairs. The
subject was instructed to choose which statement of each pair was more
characteristic of her, and to indicate her choice (A or B) on a separate
answer sheet.
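The pairing rule described above (statements from different scales, endorsement proportions differing by no more than .02, leftovers sent to filler statements) can be sketched as a simple greedy procedure. This is only an illustration under stated assumptions: the report does not say how pairs were actually chosen, the function name and data are hypothetical, and the sketch ignores the further restriction that no more than two pairs may come from the same two scales.

```python
def pair_by_popularity(statements, tolerance=0.02):
    """Greedily pair statements from different scales with similar popularity.

    statements: list of (statement_id, scale, popularity) tuples.
    Returns (pairs, unpaired); unpaired statements would need filler partners.
    """
    remaining = sorted(statements, key=lambda s: s[2])
    pairs, unpaired = [], []
    while remaining:
        first = remaining.pop(0)
        match_index = None
        for i, candidate in enumerate(remaining):
            # Small epsilon guards against floating-point noise at the boundary.
            if candidate[2] - first[2] > tolerance + 1e-9:
                break  # list is sorted, so later candidates are further away
            if candidate[1] != first[1]:
                match_index = i
                break
        if match_index is None:
            unpaired.append(first)
        else:
            pairs.append((first, remaining.pop(match_index)))
    return pairs, unpaired


# Hypothetical statements: (id, scale, endorsement proportion).
example = [
    ("a1", "Affiliation", 0.62),
    ("c1", "Achievement", 0.63),
    ("a2", "Affiliation", 0.30),
    ("d1", "Dominance", 0.31),
    ("a3", "Affiliation", 0.90),
]
pairs, unpaired = pair_by_popularity(example)
print(pairs)
print(unpaired)
```

A greedy pass is the simplest choice here; a constrained matching procedure would be needed to honor every restriction the authors list.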

Although most statements representing a given personality dimension were
paired with statements keyed on other scales, the item keying was not strictly
ipsative. The analytical problems uncovered for ipsative measures (Radcliffe,
1963, 1965; Stricker, 1965) will thus not hold for these items. One of the
purposes of the present investigation is to evaluate the extent to which
partial ipsatization will allow analytical treatment of forced-choice results.

Behavior rating questionnaire. Subjects were requested to complete a
behavior rating schedule with respect to 20 behavior traits on which they
rated themselves, as well as every member of their residential group. Each
trait was designated by an adjective and an accompanying definition selected
carefully to represent each of the 20 scales on the PRF. The technique used
was a refinement of one adapted by Jackson (1967), Jackson and Guthrie (1968),
and Kusyszyn and Jackson (1968) from the work of Campbell, Miller, Lubetsky,
and O'Connell (1964). A nine-point scale was used for all ratings, ranging
from "nine" (extremely characteristic), through "5" (neutral), to "1"
(extremely uncharacteristic). In order to appraise the effects of degree of
acquaintance, a rating on a nine-point scale of how well a subject knew each
member of her group was obtained, with a rating of "9" defined as knowing the
resident "extremely well" and a "1" as "don't know her at all."







Procedure



Subjects within each residential group were randomly divided into two
sets, the first to be administered PRF Form AA, and the second PRF Form C.
Form AA was completed by 98 subjects and the forced-choice form by 118. Upon
completion of the PRF, the behavior rating questionnaire was distributed and
completed. The full session lasted about two hours.

Data Reduction and Analysis 

The PRF data were scored in the usual fashion, by counting for each
subject the number of responses in the keyed direction for a scale (Jackson,
1967). This yielded 20 content scores per subject. The 20 self-ratings per
subject were in their final form, requiring no further reduction.

Reduction of the peer rating data was more complex, since each subject
had rated 13 to 22 of her peers. Working with the data of one intact
residential group at a time, a set of 20 mean peer ratings was computed for
each subject.

The foregoing procedure produced 62 scores per subject: 22 PRF scores, 20
self-ratings, and 20 mean peer ratings. The 62 scores from the sample of
subjects who took the PRF Form AA and from the sample who took Form C were
intercorrelated separately to produce two multitrait-multimethod matrices
(Campbell & Fiske, 1959).

For purposes of computing the reliability of the peer ratings, ratings
pertaining to a given subject were alternately used in computing two
additional mean ratings per trait. Essentially the new mean ratings so formed
were random split-half mean ratings, based on two separate subsets of judges.
Therefore, the correlation between them was corrected for double the number
of raters by the Spearman-Brown formula, giving the reliability coefficient
of the peer ratings based on its generalizability to a population of judges
using a fixed rating scale.
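The split-half procedure just described can be sketched as follows for a single trait. The ratings are hypothetical, the helper names are illustrative, and the code assumes the standard Spearman-Brown step-up for doubled length, r_full = 2r / (1 + r); the report does not give the authors' computational details.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_means(ratings):
    """Mean rating from alternate judges: (1st, 3rd, ...) and (2nd, 4th, ...)."""
    return mean(ratings[0::2]), mean(ratings[1::2])


def spearman_brown_doubled(r_half):
    """Step up a split-half correlation to the full number of judges."""
    return 2.0 * r_half / (1.0 + r_half)


# Hypothetical data: one list per ratee, one nine-point rating per judge.
ratings_by_ratee = [
    [7, 8, 6, 7],
    [3, 4, 2, 3],
    [5, 5, 6, 4],
    [8, 9, 9, 8],
    [2, 1, 3, 2],
]
halves = [split_half_means(r) for r in ratings_by_ratee]
r_half = pearson_r([h[0] for h in halves], [h[1] for h in halves])
reliability = spearman_brown_doubled(r_half)
print(f"split-half r = {r_half:.3f}, stepped-up reliability = {reliability:.3f}")
```

Because each half is based on only half of the judges, the raw split-half correlation understates the reliability of the full mean rating; the step-up corrects for that.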



Results 

In the following paragraphs, the properties of the true-false and
forced-choice scales are examined in the context of analysis of
multitrait-multimethod matrices. Each matrix involves the measurement of 20
traits by each of three methods: the PRF, self-ratings, and peer ratings. A
separate matrix was computed for the true-false sample and for the
forced-choice sample. Comparisons are made between forced-choice and
true-false scales in terms of the usual scale properties of internal
consistency and convergent validity, but emphasis is placed on examination of
discriminant reliability and validity. In addition, an examination is made of
differences between forced-choice and true-false scale means. Finally, the
effects on scale properties of degree of rater acquaintance with the ratee
were examined.

Peer Ratings 

In the present study peer ratings and self-ratings on the 20 traits
corresponding to the 20 PRF content scales were used as criteria for assessing
the relative validity of true-false and forced-choice scales. Therefore, it
is appropriate to present the reliabilities of the peer ratings. The
reliabilities, based on means for each subject derived from split halves of
judges, were within an acceptable range, ranging from .58 to .92, with a
median of .85 in one sample and .86 in the other sample.

Although reliabilities were substantial, the judges showed poor
discrimination among the various traits. The fact that many of the mean peer
ratings were highly intercorrelated indicates that the judges were basing
their ratings on fewer than the 20 dimensions involved. The extent of the
problem is illustrated by the fact that 10 per cent of the correlations among
peer ratings were equal to or greater than .60. The discriminant properties
of peer ratings were considerably improved when calculations were based only
on the ratings of peers who indicated a higher than average degree of
acquaintance with the ratee, as is indicated below. Nevertheless, the overall
weak evidence for the discriminant reliability of the peer ratings should be
borne in mind in considering the validity of the questionnaire data.

Comparison of True-False and Forced-Choice Scales 

Summary statistics and reliability. When statements are paired on item
popularity, the resulting forced-choice item has an expected popularity of
.50. Therefore, the expected mean for a 20-item scale is 10. Furthermore, in
a fully ipsatized set of scales, a given subject must have a mean score of 10
across scales. This combination of conditions, therefore, was expected to
restrict the range of means for the scales in the partially ipsatized
forced-choice version. In fact, Form C had a mean of 9.6 across scale means,
close to the expected mean of 10.0, and a range of mean scores from 7.6 for
Dominance to 12.8 for Nurturance. The true-false form, Form AA, had a mean
across scales of 10.6, but had a larger range of mean scale scores, ranging
from 5.2 for Aggression to 16.6 for Affiliation. The scales with the highest
and lowest means in Form C were different from the scales with the highest
and lowest means respectively in Form AA, although there was a substantial
correlation between the two sets of means.







The KR-20 reliabilities for the 20 scales in Form AA and the 20 scales
in Form C are presented in Table 1. For 19 of the 20 scales the reliability
was higher for Form AA than for Form C. In Form AA the mean reliability was
.75 with a range of .44 to .86 (somewhat lower than those reported for Form
AA in the PRF Manual), while in Form C the mean reliability was .53 with a
range from .39 to .71. The range of reliabilities is smaller for the
forced-choice form, a fact probably attributable to the partial ipsatization
procedure. The marked differences in reliability between the forced-choice
and true-false forms bear very directly on the interpretation of differences
between the validities of the respective scales.

Insert Table 1 about here
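For readers unfamiliar with the coefficient, a minimal sketch of the standard KR-20 formula follows: KR-20 = (k / (k - 1)) * (1 - sum(p_j * q_j) / variance of total scores), where k is the number of items and p_j is the proportion of keyed responses to item j. The response matrix is hypothetical and tiny; it is not data from the study, and the function name is illustrative.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomously scored items.

    responses: list of subjects, each a list of 0/1 keyed item scores.
    Uses the population variance of total scores (divide by N), which is
    consistent with using p * q as each item's variance.
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_subjects
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_subjects
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_subjects
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical 5 subjects x 4 items, keyed responses scored 1.
example_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"KR-20 = {kr20(example_responses):.2f}")
```

The formula makes visible why restricted item variance or weak item intercorrelations lower the coefficient: both shrink the total-score variance relative to the sum of item variances.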

In order to interpret the reliabilities, one must look not only at the
absolute size of the reliabilities, but at the size of the reliabilities
relative to the correlations among the scales in the respective forms
(Campbell & Fiske, 1959). For both forms there was a good degree of
discriminant reliability. In Form C only two scales had reliabilities which
were reached or exceeded by correlations with other scales. Form AA had none.

Relative validity of true-false and forced-choice scales. Table 1 lists
the correlations between all scales and the corresponding peer ratings.
Examination of Table 1 reveals that there were no clear-cut differences in
validity between the Form AA scales and Form C scales. For Form AA, 12 of the
20 scales were significantly correlated (p < .05) with peer ratings, while 15
from Form C were significantly correlated with peer ratings. In both Form AA
and Form C the range of peer rating validities was from 0 to .54, with means
of approximately .30.

The situation with self-ratings was slightly different. On Form AA all
20 scales were significantly correlated with self-ratings; on Form C, 19
scales were significantly correlated with self-ratings. The mean self-rating
validity coefficients for Form AA and Form C were .47 and .35, respectively.
Eighteen out of the 20 scales had higher self-rating validity coefficients on
Form AA than on Form C.

It would appear that the two formats of the PRF are essentially similar in
terms of the uncorrected validity coefficients found in this study, and that
there is little basis for choosing one or another based on their ability to
predict the behavior ratings. It should be remembered, however, that the
reliabilities were lower for the forced-choice scales. It follows, therefore,
that if the reliabilities of the forced-choice scales could be experimentally
raised to equal those of the true-false scales, the forced-choice scales
might be expected to be more valid than the true-false scales.
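The reasoning in the preceding paragraph can be made concrete with a small worked example. It rests on an assumption from classical test theory that the report does not spell out: an observed validity coefficient is proportional to the square root of the predictor's reliability. The figures plugged in are the approximate means reported above (.30, .53, .75); applying them to means rather than to individual scales is a simplification made only for illustration, and the function name is hypothetical.

```python
import math


def validity_at_reliability(validity, current_reliability, target_reliability):
    """Project a validity coefficient to a different predictor reliability.

    Assumes observed validity scales with the square root of reliability,
    with the criterion and the true-score relationship held fixed.
    """
    return validity * math.sqrt(target_reliability / current_reliability)


observed_validity = 0.30       # approximate mean peer-rating validity, both forms
form_c_reliability = 0.53      # mean KR-20 reported for the forced-choice form
form_aa_reliability = 0.75     # mean KR-20 reported for the true-false form

projected = validity_at_reliability(
    observed_validity, form_c_reliability, form_aa_reliability
)
print(f"projected Form C validity at Form AA reliability: {projected:.3f}")
```

Under this assumption the projected forced-choice validity exceeds the observed true-false value, which is the sense in which the forced-choice scales "might be expected to be more valid" if their reliability could be raised.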

Analysis of Degree of Acquaintance 

Respondents were assigned two sets of behavior rating scores for each of
the 20 traits, one based on the average ratings received by that respondent
from all judges whose rating of degree of acquaintance for this subject was
above the mean rating of degree of acquaintance; and the second score was the
mean rating of the judges rating this subject as below the mean in degree of
acquaintance. These scores were then correlated with the corresponding 20
scores for the PRF. The resulting sets of correlations represented the
convergent validities of the PRF scores, permitting a comparison of their
relative validity for the two levels of degree of acquaintance. For Group I
(Form AA), 18 of the 20 scales showed a higher PRF validity for judges high
in degree of acquaintance (p < .01 by sign test), while for Group II (Form
C), 17 of the 20 scales showed higher validities for the high degree of
acquaintance judges (p < .01). PRF scales were divided into two groups of 10
on the basis of their mean validities in the present study, and the average
validity coefficient was plotted as a function of degree of acquaintance.
From Figure 1 it can be seen that the role of degree of acquaintance operates
not only for scales showing substantial validity, but for scales showing
lesser validity, this trend being equally apparent in the two distinct groups
which were administered different forms of a personality questionnaire. These
results suggest strongly that judgmental accuracy varies as a function of
degree of acquaintance, and they lend credence to the hypothesis that
behavior ratings are based on discriminant information about ratees.

Insert Figure 1 about here
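The sign tests reported in this section can be checked with an exact binomial calculation. The sketch below assumes the usual null hypothesis that each scale is equally likely to favor either group of judges; the report does not state whether a one-tailed or two-tailed test was used, so both are available. The function name is illustrative.

```python
from math import comb


def sign_test_p(successes, n, two_sided=True):
    """Exact sign-test p value under a null probability of .5 per trial."""
    k = max(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail) if two_sided else tail


for count in (18, 17):
    p_two = sign_test_p(count, 20)
    p_one = sign_test_p(count, 20, two_sided=False)
    print(f"{count} of 20: one-tailed p = {p_one:.5f}, two-tailed p = {p_two:.5f}")
```

Both counts fall below .01 on either reading, in line with the significance levels given in the text.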

There is another way in which degree of acquaintance might operate to
affect peer judgments: by serving to simplify the factor structure of the
monomethod correlations. The traits defining the PRF have generally not been
found to intercorrelate excessively when measured by personality items.
Nevertheless, the trait ratings showed many high intercorrelations. When the
entire set of ratings was intercorrelated, a total of 40 exceeded the rather
high value of .60. However, when just those raters indicating above average
degree of acquaintance were separated for each subject, only 24 correlations
in the matrix exceeded .60, suggesting that the simplification that usually
takes place in judgments about personality seems to decrease when
acquaintance is higher. This would seem to be at variance with the findings
of Passini and Norman (1966), who found no greater differentiation among
well-acquainted subjects.

One further analysis was undertaken, namely, an investigation of the
extent to which ratings of degree of acquaintance by individual judges
correlated with their ratings of substantive traits. Some rather dramatic
correlations were uncovered, as, for example, a correlation of .73 between a
rating of high degree of acquaintance and a rating of "sociable." The pattern
of these correlations was such as to suggest that there was some systematic
distortion in the ratings for substantive traits, depending upon the degree
of acquaintance between the judge and the ratee. The pattern of relationships
seemed to suggest further that the distortion was marked for some types of
traits, but not for others. Acquaintance ratings were associated with trait
ratings representing Affiliation and Exhibition, for example, but not
Achievement. In order to test the hypothesis that such findings might be
linked to the degree to which judges may tend to overestimate the presence of
salient traits in individuals they know well, we separated the personality
dimensions into two groups: those within the PRF correlating highly with the
Dominance scale, and those correlating less highly or negatively with
Dominance. These were designated salient and nonsalient traits, respectively.
It should be noted that this separation, being based upon PRF
intercorrelations, was entirely independent of the results obtained with the
trait ratings.

Figure 2 presents the rather dramatic relationship between the salience
of traits and their correlation with degree of acquaintance. It appears that
judges are very prone to attribute characteristics linked to sociability,
play, dominance, impulsivity, and even thrill-seeking to individuals whom
they know well, and to attribute the lack of these, or their opposites, to
individuals whom they know less well. It is tempting to speculate that the
causation might go the other way; that assertive individuals might be more
likely to be well known. However, it should be remembered that these results
pertain to every subject, and that the judges, not the subjects, have been
distinguished in terms of degree of acquaintance, with scores for every
subject based on the two sets of judges differentiated in terms of their
acquaintance with him. Indeed, the mean degree of acquaintance for a
particular subject was found to possess generally low correlations with
heteromethod information about personality traits.

Insert Figure 2 about here

Discussion 

Some important results emerged from the present study. Validities for
Form AA and Form C were very similar, while scales on Form AA had higher
reliabilities than did the corresponding scales on Form C. Form AA was found
to be superior to Form C in predicting self-ratings.

This study was different from other studies in that the true-false and
forced-choice scales being compared were composed of identical
self-descriptive statements, while most previous studies have compared scales
composed of different statements (Scott, 1968). Furthermore, unlike most
previous studies, the present study examined scales covering a broad range of
personality dimensions.

Scales on Form AA consistently showed higher reliabilities than the
corresponding scales on Form C. The relatively lower reliabilities on the
forced-choice form might be due to the fact that when two highly reliable
statements with almost identical popularities are paired, a subject may
choose one of these because of the salience of one statement or because of
rejection of the other. For example, if an affiliation and an achievement
statement were paired, a subject may have chosen the affiliation statement
either because she considered the affiliation statement to be particularly
self-descriptive or because she wished to avoid endorsing the achievement
item. Therefore, the decision to choose or not choose the one alternative may
be based at least partially upon irrelevant considerations, namely, the
presence or absence of a second trait. Thus, if a subject in the example
endorsed an affiliation item, not because of her level of affiliation but
primarily because she wished to avoid endorsing an achievement item, this
would add to the unreliability of the affiliation scale. The systematic
pairing of reliable items from diverse scales would thus tend to attenuate
the reliability of each scale. Of course, the procedure of requiring only one
response to yield information about two items also reduces the reliability by
essentially halving the number of item responses.
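The effect of halving the number of item responses can be roughed out with the general Spearman-Brown prophecy formula, r_new = k r / (1 + (k - 1) r), where k is the factor by which test length changes. This is an illustrative calculation, not an analysis from the report: it assumes that halving the responses behaves like halving a test made of parallel items, and it uses the reported mean Form AA reliability of .75 as the starting point. The function name is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor."""
    return (length_factor * reliability) / (
        1.0 + (length_factor - 1.0) * reliability
    )


form_aa_mean_reliability = 0.75  # mean KR-20 reported for the true-false form
halved = spearman_brown(form_aa_mean_reliability, 0.5)
print(f"predicted reliability with half the responses: {halved:.2f}")
```

Under these assumptions, halving alone would account for part of the drop toward the mean of .53 reported for Form C, leaving room for the item-pairing effects the authors describe.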

An alternative strategy for constructing forced-choice items would be to
pair each statement with an irrelevant filler statement, but this would
require twice as many statements as are contained in the true-false version,
an extremely inefficient procedure. Yet another strategy would involve
pairing two oppositely keyed items from the same scale (Jackson & Minton,
1963). But this strategy, while avoiding acquiescence bias, would not
ordinarily permit the incorporation of the major presumed advantage of the
forced-choice procedure, namely, its suppression of favorability or
communality bias. This is the case because it is not possible for most
personality traits to develop item pools with symmetric distributions of
desirability or popularity values around a neutral point for positively and
negatively keyed items.

It was mentioned that the validity of the forced-choice scales might be
improved if their reliabilities could be improved. It should be clear,
however, that experimentally increasing the reliability of the forced-choice
scales would be fraught with practical difficulties. This is not to say that
the finding is not important. The forced-choice scales may indeed be more
valid than the true-false scales in conditions where other factors might
lower the reliabilities of the true-false scales; for example, in situations
where subjects are prone to acquiesce. Another situation where true-false
scales could be expected to be less reliable than forced-choice scales is in
the measurement of psychopathological traits where the expected endorsement
proportions of true-false items are very small or very large. In such cases
where the popularities are extreme, the restricted item variance attenuates
reliability. However, when such items are paired on popularity, the expected
popularity of the resulting forced-choice item is .50. But it may not make
good psychological sense to force highly skewed distributions into a normal
distribution. At the item level, pairing items from a hallucination scale and
from a delusion scale would force a respondent to endorse one of these even
if these dispositions were absent in his behavior.

Previous studies have found the true-false format to be more susceptible
to desirability bias than the forced-choice format, a fact which may account
for the higher correlations between Form AA scales and their respective
self-ratings than the correlations between Form C scales and their respective
self-ratings. Desirability bias was probably operating in the self-ratings of
both samples. It is possible that desirability bias was operating in a
similar manner in Form AA, while not operating, or operating to a lesser
degree, in Form C. Thus desirability bias could account for the higher
self-rating validities of the true-false scales.

The implications of the results bearing on degree of acquaintance are 
important. Degree of acquaintance in studies utilising behavior ratings or 
peer judgments is a variable of critical importance, both for understanding the 
accuracy of these judgments, and for identifying a form of systematic bias in 
these judgments. This bias creates a distortion causing judges to ascribe 
certain kinds of traits to ratees with whom they are well acquainted. This 
form of method variance tends to be specific to judgments, and might therefore 
ultimately be useful as a suppressor variable, should validities be high 
enough to warrant the use of suppressors. 

In conclusion, in the absence of clearcut evidence for superior properties 
for scales using one or the other item format, decisions must be based on 
other considerations such as the simplicity and the nonipsative nature of the 
true-false form. Thus, the true-false form will likely be the method of 
choice for some time to come. 





References 

Borislow, B. The Edwards Personal Preference Schedule (EPPS) and fakability. 
Journal of Applied Psychology, 1958, 42, 22-27. 

Campbell, D. T., & Fiske, D. W. Convergent and discriminant validation by 
the multitrait-multimethod matrix. Psychological Bulletin, 1959, 56, 
81-105. 

Campbell, D. T., Miller, N., Lubetsky, J., & O'Connell, E. J. Varieties of 
projection in trait attribution. Psychological Monographs, 1964, 78, 
No. 15 (Whole No. 592), 1-33. 

Corah, N. L., Feldman, M. J., Cohen, I. S., Gruen, W., Meadow, A., & Ringwall, 
E. A. Social desirability as a variable in the Edwards Personal Preference 
Schedule. Journal of Consulting Psychology, 1958, 22, 70-72. 

Edwards, A. L. The relationship between the judged desirability of a trait 
and the probability that the trait will be endorsed. Journal of Applied 
Psychology, 1953, 37, 90-93. 

Edwards, A. L. Manual for the Edwards Personal Preference Schedule. New York: 
Psychological Corporation, 1954. 

Edwards, A. L. The social desirability variable in personality assessment and 
research. New York: Dryden, 1957. 

Edwards, A. L., Wright, C. E., & Lunneborg, C. E. A note on "Social desirability 
as a variable in the Edwards Personal Preference Schedule." 
Journal of Consulting Psychology, 1959, 23, 558. 

Feldman, M. J., & Corah, N. L. Social desirability and the forced-choice 
method. Journal of Consulting Psychology, 1960, 24, 480-482. 

Gordon, L. V. Validation of the forced-choice and the questionnaire methods of 
personality measurement. Journal of Applied Psychology, 1951, 35, 407-412. 



Heineman, C. E. A forced-choice form of the Taylor Anxiety Scale. Journal of 
Consulting Psychology, 1953, 17. 

Izard, C. E., & Rosenberg, N. Effectiveness of a forced-choice leadership 
test under varied experimental conditions. Educational and Psychological 
Measurement, 1958, 18, 57-62. 

Jackson, D. N. Personality Research Form. Goshen, New York: Research 
Psychologists Press, 1967. 

Jackson, D. N., & Guthrie, G. M. Multitrait-multimethod evaluation of the 
Personality Research Form. In Proceedings of the 76th Annual Convention 
of the American Psychological Association, 1968, 177-178. 

Jackson, D. N., & Minton, H. A forced-choice adjective preference scale for 
personality assessment. Psychological Reports, 1963, 12, 515-520. 

Jackson, D. N., & Payne, I. R. Personality scale for shallow affect. 
Psychological Reports, 1963, 13, 687-698. 

Krug, R. E. The effect of specific selection sets on a forced-choice 
self-description inventory. Journal of Applied Psychology, 1958, 42, 89-92. 

Kusyszyn, I., & Jackson, D. N. A multimethod factor analytic appraisal of 
endorsement and judgment methods in personality assessment. Educational 
and Psychological Measurement, 1968, 28, 1047-1060. 

La Pointe, R. E., & Auclair, G. A. The use of social desirability in 
forced-choice methodology. American Psychologist, 1961, 16, 446. (Abstract) 

Longstaff, H. P., & Jurgensen, C. E. Fakability of the Jurgensen Classification 
Inventory. Journal of Applied Psychology, 1953, 37, 86-89. 

Magnusson, D. Test theory. Reading, Mass.: Addison-Wesley, 1967. 




Maher, H. Studies of transparency in forced-choice scales: I. Evidence of 
transparency. Journal of Applied Psychology, 1959, 43, 275-278. 

Mais, R. D. Fakability of the Classification Inventory scored for 
self-confidence. Journal of Applied Psychology, 1951, 35, 172-174. 

Murray, H. A. Explorations in personality. Cambridge, Mass.: Oxford 
University Press, 1938. 

Neill, J. A., & Jackson, D. N. An empirical evaluation of item selection 
strategies in personality scale development. Educational and Psychological 
Measurement, 1970, 30, 647-661. 

Norman, W. T. Personality measurement, faking, and detection: An assessment 
method for use in personnel selection. Journal of Applied Psychology, 
1963, 47, 225-241. 

Passini, F. T., & Norman, W. T. A universal conception of personality 
structure? Journal of Personality and Social Psychology, 1966, 4, 44-49. 

Radcliffe, J. A. Some properties of ipsative score matrices and their relevance 
for some current interest tests. Australian Journal of Psychology, 
1963, 15, 1-11. 

Radcliffe, J. A. Review of Edwards Personal Preference Schedule. In O. K. 
Buros (Ed.), The Sixth Mental Measurements Yearbook. Highland Park, 
New Jersey: Gryphon Press, 1965. Pp. 195-200. 

Rusmore, J. T. Fakability of the Gordon Personal Profile. Journal of Applied 
Psychology, 1956, 40, 175-177. 

Saltz, E., Reece, M., & Ager, J. Studies of forced-choice methodology: 
individual differences in social desirability. Educational and Psychological 
Measurement, 1962, 22, 365-370. 



Scott, W. A. Comparative validities of forced-choice and single-stimulus 
tests. Psychological Bulletin, 1968, 70, 231-244. 

Stricker, L. J. Review of Edwards Personal Preference Schedule. In O. K. 
Buros (Ed.), The Sixth Mental Measurements Yearbook. Highland Park, 
New Jersey: Gryphon Press, 1965. Pp. 200-207. 

Waters, L. K., & Wherry, R. J., Jr. The effect of intent to bias on 
forced-choice indices. Personnel Psychology, 1962, 15, 207-214. 

Winters, S., Bartlett, C. J., & Leve, R. Instructional and response style 
factors with forced-choice response. Paper presented at the meetings 
of the American Psychological Association, Chicago, Illinois, 1965. 




Footnotes 



1. Reprints are obtainable from Douglas N. Jackson, Department of Psychology, 
University of Western Ontario, London 72, Ontario, Canada. 

2. Supported in part from Research Grant No. 397 from the Ontario Mental 
Health Foundation. This paper was completed when Douglas N. Jackson was 
Visiting Scholar, Personality and Social Behavior Research Group, Division of 
Psychological Studies, Educational Testing Service, Princeton, New Jersey. 






Table 1 

Reliability and Validity of True-False and Forced-Choice Forms 

                      True-False (N = 98)         Forced-Choice (N = 118)
                              Validity                    Validity
Scale                 KR-20   Self-    Peer       KR-20   Self-    Peer
                              Ratings  Ratings            Ratings  Ratings

Abasement              58      32      -05         47      19       10
Achievement            77      61       45         44      39       46
Affiliation            75      63       37         54      45       23
Aggression             71      43       21         42      37       21
Autonomy               67      48       41         53      36       32
Change                 69      41       09         39      22       20
Cognitive Structure    76      24       10         40      36       06
Defendence            [?]      33       13         42      27       29
Dominance              79      48       41         68      42       54
Endurance              73      46       24         46      28       19
Exhibition             75      50       42         61      38       45
Harmavoidance          84      55       31         71      52       46
Impulsivity            71      57       34         47      31       30
Nurturance             72      50       17         56      35       11
Order                  86      78       54         66      63       50
Play                   66      56       29         49      34       21
Sentience              44      28       17         54      32       14
Social Recognition     81      53       18         6?      49       20
Succorance             72      45       43         61      43       18
Understanding          60      31       17         51      39       09

Note. Decimals have been omitted. For Form AA, the .05 and .01 significance 
levels are .20 and .26, respectively; for Form C they are .18 and .23, respectively. 




[Figure 1 not reproduced. Axis label: Average validity of scales.] 

Fig. 1. Relation of degree of acquaintance of raters to behavior rating 
validity of personality scales. 



[Figure 2 not reproduced. Axis label: Salience of traits.] 

Fig. 2. Relation of rated degree of acquaintance with ratings for salient 
and nonsalient personality traits. 


